Estimating Speaker Clustering Quality Using Logistic Regression
Abstract
This paper focuses on estimating clustering validity using logistic regression. In many applications it is important to estimate the quality of a clustering; for example, when clustering speech segments, one must decide whether the clustered data can be used for speaker verification. For clustering short speaker segments, the common cluster-validity criteria are average cluster purity (ACP), average speaker purity (ASP), and K, the geometric mean of the two. In practice, true labels are not available for evaluation, so these measures have to be estimated from the clustering itself. In this paper, mean-shift clustering with PLDA scores is applied to cluster short speaker segments represented as i-vectors. Different statistical parameters are then estimated on the clustered data and used to train logistic regression to estimate ACP, ASP, and K. It was found that logistic regression can be a good predictor of the actual ACP, ASP, and K, and that it yields reasonable information about the clustering quality.
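The abstract names ACP, ASP, and K without giving formulas. The sketch below assumes the definitions commonly used in the speaker clustering literature, where n_ij counts the segments in cluster i that belong to speaker j; the paper itself may weight by duration or differ in detail. The function name `purity_scores` and its arguments are hypothetical, and reference speaker labels are assumed to be available, which is only the case at evaluation or training time.

```python
from collections import Counter
from math import sqrt


def purity_scores(true_speakers, cluster_ids):
    """Return (ACP, ASP, K) for a clustering, given reference speaker labels.

    Assumed definitions (not taken from the abstract):
      ACP = (1/N) * sum_i sum_j n_ij^2 / n_i.   (n_i. = size of cluster i)
      ASP = (1/N) * sum_j sum_i n_ij^2 / n_.j   (n_.j = segments of speaker j)
      K   = sqrt(ACP * ASP)
    """
    if len(true_speakers) != len(cluster_ids):
        raise ValueError("label lists must have the same length")
    n_total = len(true_speakers)
    if n_total == 0:
        raise ValueError("at least one segment is required")

    # n_ij: number of segments in cluster i that belong to speaker j
    joint = Counter(zip(cluster_ids, true_speakers))
    cluster_sizes = Counter(cluster_ids)
    speaker_sizes = Counter(true_speakers)

    # Each joint count contributes n_ij^2 divided by its cluster/speaker total.
    acp = sum(n * n / cluster_sizes[c] for (c, _), n in joint.items()) / n_total
    asp = sum(n * n / speaker_sizes[s] for (_, s), n in joint.items()) / n_total
    k = sqrt(acp * asp)
    return acp, asp, k


if __name__ == "__main__":
    # Four segments, two speakers; one segment of speaker "b" lands in cluster 0.
    acp, asp, k = purity_scores(["a", "a", "b", "b"], [0, 0, 0, 1])
    print(f"ACP={acp:.3f} ASP={asp:.3f} K={k:.3f}")
```

Under these definitions, ACP drops when a cluster mixes speakers and ASP drops when a speaker is split across clusters, so K penalizes both kinds of error. In the setting the abstract describes, values like these would serve as regression targets, since they cannot be computed directly without true labels.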
Similar resources
Hierarchical speaker clustering methods for the NIST i-vector Challenge
The process of manually labeling data is very expensive and sometimes infeasible due to privacy and security issues. This paper investigates the use of two algorithms for clustering unlabeled training i-vectors. This aims at improving speaker recognition performance by using state-of-the-art supervised techniques in the context of the NIST i-vector Machine Learning Challenge 2014. The first alg...
Partitioning of Two-Speaker Conversation Datasets
We address the speaker partitioning problem on datasets composed of two-speaker conversations. In such a situation, it is desirable to obtain a good overall diarization performance but even in that case, the performance of the partitioning problem can be severely degraded if some of the recordings are incorrectly segmented. We show that the performance of a bottom-up speaker clustering approach...
Predicting Customer Churn Using CLV in Insurance Industry
Today, increased customer awareness means that customers can easily access other suppliers and obtain services from competitors with similar or even better quality at the same price. Therefore, focusing on customers and preventing them from leaving has become the most important strategy for any company. Research has shown that retaining former customers is cheaper than attracting ne...
Speaker recognition with penalized logistic regression machines
Øystein Birkenes (Norwegian University of Science and Technology), Tomoko Matsui (Institute of Statistical Mathematics). Abstract: We study speaker recognition using a penalized logistic regression machine (PLRM) [1-3]. Parameters of a multiclass logistic regression model with the log-likelihood values of speaker Gaussian mixture models (GMMs) are discriminatively estimated, and the model is used for speaker decision. In speaker identification experimen...
Estimating the Concrete Compressive Strength Using Hard Clustering and Fuzzy Clustering Based Regression Techniques
Understanding the compressive strength of concrete is important for activities such as construction arrangement, prestressing operations, proportioning new mixtures, and quality assurance. Regression techniques are most widely used for prediction tasks in which the relationship between the independent variables and the dependent (prediction) variable is identified. The accuracy of the regressio...